Safe Policies for Reinforcement Learning via Primal-Dual Methods

Authors

Santiago Paternain, Miguel Calvo-Fullana, Luiz F. O. Chamon, Alejandro Ribeiro

Abstract

In this article, we study the design of controllers in the context of stochastic optimal control under the assumption that a model of the system is not available. That is, we aim to control a Markov decision process whose transition probabilities we do not know, but to which we have access through sample trajectories gathered by experience. We define safety as the agent remaining within a desired safe set with high probability during the operation time. The drawbacks of this formulation are twofold: the problem is nonconvex, and computing the gradients of the constraints with respect to the policies is prohibitive. Hence, we propose an ergodic relaxation of the problem with the following advantages. 1) The safety guarantees are maintained in the case of episodic tasks, and they hold until a given time horizon for continuing tasks. 2) The constrained optimization problem, despite its nonconvexity, has an arbitrarily small duality gap if the parametrization of the controller is rich enough. 3) The gradients of the Lagrangian associated with the safe-learning problem can be computed using standard reinforcement learning results and stochastic approximation tools. Leveraging these advantages, we exploit primal-dual algorithms to find optimal safe policies. We test the proposed approach on a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
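
As an illustration of the primal-dual scheme sketched above, the snippet below alternates a stochastic policy-gradient (primal) step on the Lagrangian with a projected (dual) step on the safety multiplier. It is a minimal toy example and not the authors' implementation: the one-dimensional navigation task, the Gaussian policy, the REINFORCE-style gradient estimator, and all step sizes are assumptions made to keep the sketch runnable.

    import numpy as np

    rng = np.random.default_rng(0)

    GOAL, SAFE_LOW, SAFE_HIGH = 1.0, -0.5, 1.5   # toy 1-D navigation task (assumed)
    HORIZON, DELTA = 20, 0.1                     # episode length, tolerated unsafety

    def rollout(theta, sigma=0.3):
        """One episode with a Gaussian policy a ~ N(theta * err, sigma^2)."""
        x, ret, safe, grad_logp = 0.0, 0.0, 1.0, 0.0
        for _ in range(HORIZON):
            err = GOAL - x
            a = theta * err + sigma * rng.standard_normal()
            grad_logp += (a - theta * err) * err / sigma**2   # d log pi / d theta
            x += 0.1 * a + 0.05 * rng.standard_normal()       # noisy dynamics
            ret += -abs(GOAL - x)                             # reward: reach the goal
            safe *= float(SAFE_LOW <= x <= SAFE_HIGH)         # 1 only if never left the safe set
        return ret, safe, grad_logp

    theta, lam = 0.0, 1.0              # primal (policy) and dual (multiplier) variables
    eta_theta, eta_lam = 1e-3, 1e-2    # step sizes

    for _ in range(2000):
        ret, safe, grad_logp = rollout(theta)
        # Primal: stochastic ascent on L(theta, lam) = E[return] + lam * (Pr(safe) - (1 - DELTA)),
        # using the score-function (REINFORCE) estimator.
        theta += eta_theta * (ret + lam * safe) * grad_logp
        # Dual: projected descent on lam; the multiplier grows while the constraint is violated.
        lam = max(0.0, lam - eta_lam * (safe - (1 - DELTA)))

    print(f"learned gain theta = {theta:.3f}, multiplier lambda = {lam:.3f}")

The dual variable acts as an adaptive penalty: when trajectories leave the safe set too often, lam increases and the primal step trades reward for safety, which is the mechanism behind the dynamic adaptation mentioned in the abstract.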

Similar Articles

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs o...

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and have low computational complexity p...
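
For concreteness, here is a rough illustration, under my own simplifying assumptions rather than the paper's exact algorithm, of a stochastic primal-dual coordinate update built on a saddle-point form of the Bellman equation: each observed transition touches only the value entries v[s] and v[s_next] and the occupancy entry mu[s, a]. The function name, step sizes, and the initial-distribution vector q0 are placeholders.

    import numpy as np

    def spd_update(v, mu, s, a, r, s_next, gamma=0.95, alpha=0.05, beta=0.05, q0=None):
        """One stochastic coordinate update from a single transition (s, a, r, s_next).

        Simplified saddle point being approximated:
            min_v max_{mu >= 0}  (1 - gamma) * q0 @ v
                                 + sum_{s,a} mu[s, a] * (r[s, a] + gamma * E[v[s']] - v[s])
        """
        n_states = v.shape[0]
        q0 = np.full(n_states, 1.0 / n_states) if q0 is None else q0
        # Dual coordinate (occupancy / policy): ascend along the temporal-difference term.
        td = r + gamma * v[s_next] - v[s]
        mu[s, a] = max(0.0, mu[s, a] + beta * td)
        # Primal coordinates (values): descend; only v[s] and v[s_next] are updated.
        v[s] -= alpha * ((1 - gamma) * q0[s] - mu[s, a])
        v[s_next] -= alpha * gamma * mu[s, a]
        return v, mu

A greedy policy can then be read off by normalizing mu over the actions available in each state.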

Safe Reinforcement Learning via Shielding

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive sys...
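
A minimal sketch of the action-level shielding idea, assuming a hypothetical allowed_actions(state) oracle in place of the reactive system synthesized from the temporal-logic specification:

    from typing import Callable, Hashable, Iterable, Sequence

    State = Hashable
    Action = Hashable

    def shielded_action(state: State,
                        ranked_actions: Sequence[Action],
                        allowed_actions: Callable[[State], Iterable[Action]]) -> Action:
        """Return the agent's most preferred action that the shield permits; if none of
        the agent's proposals are permitted, fall back to any shield-approved action."""
        allowed = set(allowed_actions(state))
        for action in ranked_actions:      # agent's proposals, best first
            if action in allowed:
                return action
        return next(iter(allowed))         # shield override (assumes a non-blocking shield)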

Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning

Formal verification provides a high degree of confidence in safe system operation, but only if reality matches the verified model. Although a good model will be accurate most of the time, even the best models are incomplete. This is especially true in Cyber-Physical Systems because high-fidelity physical models of systems are expensive to develop and often intractable to verify. Conversely, rei...

Safe exploration for reinforcement learning

In this paper we define and address the problem of safe exploration in the context of reinforcement learning. Our notion of safety is concerned with states or transitions that can lead to damage and thus must be avoided. We introduce the concepts of a safety function for determining a state’s safety degree and that of a backup policy that is able to lead the controlled system from a critical st...
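
As a sketch of how the two concepts named in this snippet might compose (illustrative names only, not the paper's construction), a safety function scores the current state and control is handed to the backup policy whenever the score falls below a threshold:

    from typing import Any, Callable

    State = Any
    Action = Any

    def act(state: State,
            exploration_policy: Callable[[State], Action],
            backup_policy: Callable[[State], Action],
            safety_fn: Callable[[State], float],
            threshold: float = 0.0) -> Action:
        """Explore while the state's safety degree is acceptable; otherwise let the
        backup policy steer the system back toward a safe operating region."""
        if safety_fn(state) >= threshold:
            return exploration_policy(state)
        return backup_policy(state)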

Journal

Journal title: IEEE Transactions on Automatic Control

Year: 2023

ISSN: 0018-9286, 1558-2523, 2334-3303

DOI: https://doi.org/10.1109/tac.2022.3152724